Evaluating Roget's Thesauri

Authors

  • Alistair Kennedy
  • Stan Szpakowicz
Abstract

Roget’s Thesaurus has gone through many revisions since it was first published 150 years ago. But how do these revisions affect Roget’s usefulness for NLP? We examine the differences in content between the 1911 and 1987 versions of Roget’s, and we test both versions against each other and against WordNet on problems such as synonym identification and word relatedness. We also present a novel method for measuring sentence relatedness that can be implemented in either version of Roget’s or in WordNet. Although the 1987 version of the Thesaurus is better, we show that the 1911 version performs surprisingly well and that often the differences between the versions of Roget’s and WordNet are not statistically significant. We hope that this work will encourage others to use the 1911 Roget’s Thesaurus in NLP tasks.

Similar resources

Evaluation of Automatic Updates of Roget's Thesaurus

Keywords: lexical resources, Roget's Thesaurus, WordNet, semantic relatedness, synonym selection, pseudo-word-sense disambiguation, analogy. Abstract: Thesauri and similarly organised resources attract increasing interest from Natural Language Processing researchers. Thesauri age fast, so there is a constant need to update their vocabulary. Since a manual update cycle takes considerable time, autom...

Complementing WordNet with Roget's and Corpus-based Thesauri for Information Retrieval

This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget's thesaurus and corpus-derived thesauri. Words and relations that are not included in WordNet can be found in the corpus-derived thesauri. The effects of polysemy can be minimized with a weighting method that considers all query terms and all of the thesauri. Experimen...

Automatically Expanding the Lexicon of Roget's Thesaurus

In recent years much research has been conducted on building Thesauri and enhancing them with new terms and relationships. I propose to build and evaluate a system for automatically updating the lexicon of Roget’s Thesaurus. Roget’s has been shown to lend itself well to many Natural Language Processing tasks. One of the factors limiting Roget’s use is that the only publicly available version of...

Roget2000: a 2D hyperbolic tree visualization of Roget's Thesaurus

Thesauri, such as Roget’s Thesaurus, show the semantic relationships among terms and concepts. Understanding these relationships can lead to a greater understanding of linguistic structure and could be applied to creating more efficient natural-language recognition and processing programs. A general assumption is that focus and context displays of hyperbolic trees accelerate browsing ability ov...

Automatically Expansion of Thesaurus Entries with a Different Thesaurus

We propose a method for expanding the entries in a thesaurus using a different thesaurus constructed with another concept. This method constructs a mapping table between the concept codes of these two different thesauri. Then, almost all of the entries of the latter thesaurus are assigned the concept codes of the former thesaurus with the mapping table between them. To confirm whether this method ...

Publication date: 2008